Topic Detection and Segmentation in Automatic Text Summarization
Abstract
A topic, or theme, is what a discourse, a discourse fragment, or a sentence is about. It is the shortest summary of a discourse, the main proposition of a paragraph, or what is commented on in a sentence. The term topic is usually defined as the aboutness of a unit of discourse [20]. Topic structure within a text can be approached from two points of view: the document level (discourse topic) and the sentence level (sentence topic). Considered as a whole, a text usually talks about a single topic, but a deeper analysis can identify several subtopics that give additional information about the main topic. At the sentence level, every sentence has a topic (the part of the structure that is being presented) and a comment (what is being asserted about the topic). These two perspectives are explained in more detail below.
Normally, a text deals with a single main topic, which is developed throughout the document while several subtopics are introduced along the way. It is therefore important to detect the points in the text where the topic changes. This leads to a hierarchical organization of a text into topics and subtopics, topic concatenation, and semantic return [1]. Text documents are usually subdivided into paragraphs, where a paragraph can be defined as a coherent text segment focused on a single topic. Every paragraph needs a topic sentence, which is usually the first sentence of the paragraph [13] and gives the reader an idea of what the paragraph is going to be about.
The dichotomy of topic and focus is relevant not only for placing a sentence in a context, but also for its semantic interpretation [9].
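Detecting the points where the topic changes, and then choosing a topic sentence for each resulting segment, can be illustrated with a minimal Python sketch. It assumes a simplified lexical-cohesion heuristic in the spirit of TextTiling: the vocabulary of the sentences on each side of a gap is compared with cosine similarity, and a boundary is placed wherever that similarity falls below a fixed threshold (TextTiling itself uses depth scores rather than a fixed cutoff). The function names, the tiny stopword list, and the `window` and `threshold` values are hypothetical illustrative choices, not the method of this paper. The first sentence of each detected segment is returned as its topic sentence, applying the first-sentence heuristic to segments instead of paragraphs.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (real systems use a fuller list plus stemming).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "is", "are", "to", "in",
             "on", "it", "its", "for", "with", "as", "by", "this", "that"}


def tokenize(sentence: str) -> list[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(sentences: list[str], window: int = 2) -> list[float]:
    """Similarity across each gap: up to `window` sentences before vs. after it."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(sentences)):  # gap sits between sentences[gap - 1] and sentences[gap]
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def segment(sentences: list[str], window: int = 2, threshold: float = 0.1) -> list[list[str]]:
    """Start a new segment wherever the cross-gap similarity falls below `threshold`."""
    if not sentences:
        return []
    segments, current = [], [sentences[0]]
    for i, score in enumerate(gap_scores(sentences, window), start=1):
        if score < threshold:  # low lexical cohesion: assume a topic shift here
            segments.append(current)
            current = []
        current.append(sentences[i])
    segments.append(current)
    return segments


def topic_sentences(segments: list[list[str]]) -> list[str]:
    """First-sentence heuristic: treat each segment's opening sentence as its topic sentence."""
    return [seg[0] for seg in segments]


if __name__ == "__main__":
    doc = [
        "Cats are small domestic animals.",
        "Many cats sleep most of the day.",
        "Domestic cats hunt mice.",
        "Stock markets fell sharply today.",
        "Investors sold shares in stock markets.",
        "Markets may recover next week.",
    ]
    for topic in topic_sentences(segment(doc)):
        print(topic)
    # prints "Cats are small domestic animals." then "Stock markets fell sharply today."
```

On this toy input the only gap whose two sides share no vocabulary lies between the third and fourth sentences, so the sketch returns two segments. Practical segmenters usually add stemming, larger token windows, and smoothing of the gap scores before choosing boundaries.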
The topic, also known as the theme, can be understood as the part of the sentence structure that the speaker presents as readily available in the hearer's memory; in other words, it is the "given" information, while the focus (comment or rheme) reflects
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
The Use of Topic Segmentation for Automatic Summarization
Topic segmentation can be used as a preprocessing step in numerous natural language processing applications. In this short paper, we will discuss how we adapted our segmentation algorithm for automatic summarization.
Systematic literature review of fuzzy logic based text summarization
"Information Overload" is not a new term, but with the massive development in technology which enables anytime, anywhere, easy and unlimited access, participation and publishing of information has consequently escalated its impact. Assisting users' informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic and relevant information are the primary c...
Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization
Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...
Discovering Topic Boundaries for Text Summarization Based on Word Co-occurrence
Topic Segmentation is the task of breaking documents into topically coherent multiparagraph subparts. In particular, Topic Segmentation is extensively used in Text Summarization to provide more coherent results by taking into account raw document structure. However, most methodologies are based on lexical repetition that show evident reliability problems or rely on harvesting linguistic resourc...
Topic Segmentation for Textual Document Written in Arabic Language
Topic segmentation is important for many natural language processing applications such as information retrieval, text summarization... In our work, we are interested in the topic segmentation of textual document. We present a survey of related works particularly C99 and TextTiling. Then, we propose an adaptation of these topic segmenters for textual document written in Arabic language named as ...